For the final assignment, we are using an R markdown to illustrate
our research.
DATA SELECTION
To perform an analysis on the trends in ridership recovery for SEPTA
bus stops in Philadelphia, the following data sources were used:
SEPTA Bus On-boarding data 2019 & 2023 SEPTA Bus station 2019
& 2023 IndeGo Bike-share station usage data 2019 & 2023
Philadelphia Census Tracts 2021 (Opendataphilly) ACS 2019
(get.acs/ Census Bureau) ACS 2022 (get.acs/ Census Bureau)
Number of Jobs in Philadelphia 2019 & 2023 (OnTheMap)
Loading SEPTA and IndeGo data
# SEPTA Bus data
ride <- read.csv("D:/01UPENN/COURSES/CPLN5050_PlanningByNumbers/FinalProject/ridership.csv") %>%
select(-Avg_Off_2023, -Avg_Off_2019, -Off_Change) %>%
rename(GEOID = Census_Tract_ID)
ride19 <- ride %>%
select(GEOID, Avg_On_2019)
ride23 <- ride %>%
select(GEOID, Avg_On_2023)
#IndeGo Bike data
bike19 <- read.csv("D:/01UPENN/COURSES/CPLN5050_PlanningByNumbers/FinalProject/2019 trips.csv") %>%
rename(GEOID = GEOID10,
Bike.Trips = Trips)
bike23 <- read.csv("D:/01UPENN/COURSES/CPLN5050_PlanningByNumbers/FinalProject/2023 trips.csv") %>%
rename(GEOID = GEOID10,
Bike.Trips = Total.Trips) %>%
select(-Bike.trips, -Point_Count)
# SEPTA Bus stops
BusStop19 <- read.csv("D:/01UPENN/COURSES/CPLN5050_PlanningByNumbers/FinalProject/busstop2019.csv") %>%
select(GEOID10, Point_Count) %>%
rename(GEOID = GEOID10,
BusStops = Point_Count)
BusStop23 <- read.csv("D:/01UPENN/COURSES/CPLN5050_PlanningByNumbers/FinalProject/busstop2023.csv") %>%
select(GEOID10, Point_Count) %>%
rename(GEOID = GEOID10,
BusStops = Point_Count)
# Jobs
job19 <- read.csv("D:/01UPENN/COURSES/CPLN5050_PlanningByNumbers/FinalProject/jobdensity2019.csv")%>%
rename(GEOID = GEOID10)
job23 <- read.csv("D:/01UPENN/COURSES/CPLN5050_PlanningByNumbers/FinalProject/jobs2023.csv")%>%
rename(GEOID = GEOID10,
Jobs = sum_c000)
Loading Census tract data for 2019 and 2020
acs_vars <- c("B19001_001E", # Median Household Income
"B01001_001E", # Total population
"B01001_026E", # Total Female
"B01002_001E", # Median age by census tract
"B14001_001E", # Total population 3 years and over enrolled in school
"B23025_001E", # Total number of workers 16 years and over
"B08201_001E", # Total households
"B08201_002E", # No vehicle available
"B08301_003E", # Car, truck, or van - drove alone
"B08301_004E", # Car, truck, or van - carpooled
"B08301_012E", # Public transportation (excluding taxicab)
"B08301_013E", # Walked
"B08301_014E", # Other means
"B08301_015E", # Worked from home
"B03002_003E", # White alone
"B03002_004E", # Black or African American alone
"B03002_012E") # Hispanic or Latino
acsTractsPHL.2019 <- get_acs(geography = "tract",
year = 2019,
variables = acs_vars,
geometry = TRUE,
state = "PA",
county = "Philadelphia",
output = "wide") %>%
dplyr::select (GEOID, NAME, all_of(acs_vars)) %>%
rename (median_household_income = B19001_001E,
total_population = B01001_001E,
total_female = B01001_026E,
median_age = B01002_001E,
total_school_enrolled = B14001_001E,
total_workers_16_over = B23025_001E,
total_households = B08201_001E,
no_vehicle_available = B08201_002E,
drove_alone = B08301_003E,
carpooled = B08301_004E,
public_transport = B08301_012E,
walked = B08301_013E,
Othermode = B08301_014E,
WFH = B08301_015E,
white_alone = B03002_003E,
black_alone = B03002_004E,
hispanic_latino = B03002_012E)
acsTractsPHL.2022 <- get_acs(geography = "tract",
year = 2022,
variables = acs_vars,
geometry = TRUE,
state = "PA",
county = "Philadelphia",
output = "wide") %>%
dplyr::select (GEOID, NAME, all_of(acs_vars)) %>%
rename (median_household_income = B19001_001E,
total_population = B01001_001E,
total_female = B01001_026E,
median_age = B01002_001E,
total_school_enrolled = B14001_001E,
total_workers_16_over = B23025_001E,
total_households = B08201_001E,
no_vehicle_available = B08201_002E,
drove_alone = B08301_003E,
carpooled = B08301_004E,
public_transport = B08301_012E,
walked = B08301_013E,
Othermode = B08301_014E,
WFH = B08301_015E,
white_alone = B03002_003E,
black_alone = B03002_004E,
hispanic_latino = B03002_012E)
EXPLORATORY ANALYSIS
Cleaning up data
In this step, we created new variables for both years. The ACS
spatial data both years was first transformed to Pennsylvania North
State Plane to calculate the area for each census tract. The newly
created variables for our model are as follows: car: people who drove
alone or carpooled veh_own: Vehicle Ownership rate defined as the
percentage of households that own vehicles out of the total number of
households perc_female: the percentage of females in a census
track out of the total population perc_white: the percentage of
white population in a census tract out of the total population
perc_black: the percentage of black population in a census tract out of
the total population perc_hispanic: the percentage of hispanic or
latino population in a census track out of the total population
area: total geographical area of a census tract in square miles
pop_density: population per square miles
# 2019
acsTractsPHL.2019 <- acsTractsPHL.2019 %>%
st_transform("EPSG:2272")
acsTractsPHL.2019 <- acsTractsPHL.2019 %>%
mutate(Car = drove_alone+carpooled,
veh_own = (total_households- no_vehicle_available)/(total_households)*100,
perc_female = total_female/total_population,
perc_white = white_alone/total_population,
perc_black = black_alone/total_population,
perc_hispanic = hispanic_latino/total_population,
perc_school_enrolled = total_school_enrolled/total_population,
area = as.numeric(st_area(acsTractsPHL.2019$geometry)),
area = area/27878400, #square mile
pop_density = total_population/area) %>%
select(-drove_alone, -carpooled, -total_households, -no_vehicle_available, -white_alone, -black_alone, -total_female, -total_school_enrolled, -hispanic_latino)
# Vehicle Ownership Rate 2019
acsTractsPHL.2019$veh_own_ca[acsTractsPHL.2019$veh_own <= 57.84] <- "Q1"
acsTractsPHL.2019$veh_own_ca[acsTractsPHL.2019$veh_own > 57.84 & acsTractsPHL.2019$veh_own <= 69.33] <- "Q2"
acsTractsPHL.2019$veh_own_ca[acsTractsPHL.2019$veh_own > 69.33 & acsTractsPHL.2019$veh_own <= 82.55] <- "Q3"
acsTractsPHL.2019$veh_own_ca[acsTractsPHL.2019$veh_own > 82.55] <- "Q4"
acsTractsPHL.2019$veh_own_ca <- as.factor(acsTractsPHL.2019$veh_own_ca)
# Income Cat 2019
acsTractsPHL.2019$income_ca[acsTractsPHL.2019$median_household_income <= 1162] <- "Q1"
acsTractsPHL.2019$income_ca[acsTractsPHL.2019$median_household_income > 1162 & acsTractsPHL.2019$median_household_income <= 1556] <- "Q2"
acsTractsPHL.2019$income_ca[acsTractsPHL.2019$median_household_income > 1556 & acsTractsPHL.2019$median_household_income <= 1990] <- "Q3"
acsTractsPHL.2019$income_ca[acsTractsPHL.2019$median_household_income > 1990] <- "Q4"
acsTractsPHL.2019$income_ca <- as.factor(acsTractsPHL.2019$income_ca)
# 2022
acsTractsPHL.2022 <- acsTractsPHL.2022 %>%
st_transform("EPSG:2272")
acsTractsPHL.2022 <- acsTractsPHL.2022 %>%
mutate(Car = drove_alone+carpooled,
veh_own = (total_households- no_vehicle_available)/(total_households)*100,
perc_female = total_female/total_population,
perc_white = white_alone/total_population,
perc_black = black_alone/total_population,
perc_hispanic = hispanic_latino/total_population,
perc_school_enrolled = total_school_enrolled/total_population,
area = as.numeric(st_area(acsTractsPHL.2022$geometry)),
area = area/27878400, #square mile
pop_density = total_population/area) %>%
select(-drove_alone, -carpooled, -total_households, -no_vehicle_available, -white_alone, -black_alone, -total_female, -total_school_enrolled, -hispanic_latino)
# Vehicle Ownership Rate 2022
acsTractsPHL.2022$veh_own_ca[acsTractsPHL.2022$veh_own <= 60.81] <- "Q1"
acsTractsPHL.2022$veh_own_ca[acsTractsPHL.2022$veh_own > 60.81 & acsTractsPHL.2022$veh_own <= 73.89] <- "Q2"
acsTractsPHL.2022$veh_own_ca[acsTractsPHL.2022$veh_own > 73.89 & acsTractsPHL.2022$veh_own <= 85.45] <- "Q3"
acsTractsPHL.2022$veh_own_ca[acsTractsPHL.2022$veh_own > 85.45] <- "Q4"
acsTractsPHL.2022$veh_own_ca <- as.factor(acsTractsPHL.2022$veh_own_ca)
# Income Cat 2022
acsTractsPHL.2022$income_ca[acsTractsPHL.2022$median_household_income <= 1162] <- "Q1"
acsTractsPHL.2022$income_ca[acsTractsPHL.2022$median_household_income > 1162 & acsTractsPHL.2022$median_household_income <= 1556] <- "Q2"
acsTractsPHL.2022$income_ca[acsTractsPHL.2022$median_household_income > 1556 & acsTractsPHL.2022$median_household_income <= 1990] <- "Q3"
acsTractsPHL.2022$income_ca[acsTractsPHL.2022$median_household_income > 1990] <- "Q4"
acsTractsPHL.2022$income_ca <- as.factor(acsTractsPHL.2022$income_ca)
Once the ACS data was cleaned, it was merged with the bus ridership,
bike ridership, total number of bus stops and total number of jobs data
sets for each census tract and year. A binary variable called ‘BikeStop’
indicating the presence of Bike-share stations in each census tract was
created. Job density indicating the concentration of jobs within a tract
was also created. All outliers and NAs were removed at this stage.
# merge 2019
dat19 <- merge(acsTractsPHL.2019, ride19, by = "GEOID", all.x = FALSE, all.y=FALSE, sort
= FALSE)
dat19 <- merge(dat19, bike19, by = "GEOID", all.x = FALSE, all.y=FALSE, sort
= FALSE)
dat19 <- merge(dat19, BusStop19, by = "GEOID", all.x = FALSE, all.y=FALSE, sort
= FALSE)
dat19 <- merge(dat19, job19, by = "GEOID", all.x = FALSE, all.y=FALSE, sort
= FALSE)
# IndeGo or Not 2019
dat19$BikeStop <- ifelse(dat19$Bike.Trips == 0, 0, 1)
# Job density 2019
dat19$job_density <- dat19$Jobs/dat19$area
# removing zeros and NAs
dat19 <- dat19 %>%
filter(Avg_On_2019 != 0) %>%
filter(median_household_income != 0)
# merge 2019
dat23 <- merge(acsTractsPHL.2022, ride23, by = "GEOID", all.x = FALSE, all.y=FALSE, sort
= FALSE)
dat23 <- merge(dat23, bike23, by = "GEOID", all.x = FALSE, all.y=FALSE, sort
= FALSE)
dat23 <- merge(dat23, BusStop23, by = "GEOID", all.x = FALSE, all.y=FALSE, sort
= FALSE)
dat23 <- merge(dat23, job23, by = "GEOID", all.x = FALSE, all.y=FALSE, sort
= FALSE)
# IndeGo or Not 2022
dat23$BikeStop <- ifelse(dat23$Bike.Trips == 0, 0, 1)
dat23 <- dat23 %>% na.omit()
# Job density 2022
dat23$job_density <- dat23$Jobs/dat23$area
Visualizing Change in Ridership
We used T-map to visualize the shift in ridership post-covid. Average
On-boarding for tracts was categorized into 5 categories to maintain
uniformity across the years. As seen from the two maps, ridership has
fallen on the whole in Philadelphia. However, there are some tracts
where it was stayed the same or improved. While in our research we are
not looking at changes across geographies or spatial auto-correlation,
it helps to visualize the trends in our dependent variable.
# factor levels
factor_levels <- c("0-250", "251-500", "501-750", "750-1000", "> 1000")
# Avg_On_2019_ca
dat19$Avg_On_2019_ca[dat19$Avg_On_2019 <= 250] <- "0-250"
dat19$Avg_On_2019_ca[dat19$Avg_On_2019 > 250 & dat19$Avg_On_2019 <= 500] <- "251-500"
dat19$Avg_On_2019_ca[dat19$Avg_On_2019 > 500 & dat19$Avg_On_2019 <= 750] <- "501-750"
dat19$Avg_On_2019_ca[dat19$Avg_On_2019 > 750 & dat19$Avg_On_2019 <= 1000] <- "750-1000"
dat19$Avg_On_2019_ca[dat19$Avg_On_2019 > 1000] <- "> 1000"
# Assign factor levels to the column
dat19$Avg_On_2019_ca <- factor(dat19$Avg_On_2019_ca, levels = factor_levels)
# Avg_On_2023_ca
dat23$Avg_On_2023_ca[dat23$Avg_On_2023 <= 250] <- "0-250"
dat23$Avg_On_2023_ca[dat23$Avg_On_2023 > 250 & dat23$Avg_On_2023 <= 500] <- "251-500"
dat23$Avg_On_2023_ca[dat23$Avg_On_2023 > 500 & dat23$Avg_On_2023 <= 750] <- "501-750"
dat23$Avg_On_2023_ca[dat23$Avg_On_2023 > 750 & dat23$Avg_On_2023 <= 1000] <- "750-1000"
dat23$Avg_On_2023_ca[dat23$Avg_On_2023 > 1000] <- "> 1000"
# Assign factor levels to the column
dat23$Avg_On_2023_ca <- factor(dat23$Avg_On_2023_ca, levels = factor_levels)
# Total Bus Boardings, Philadelphia 2019
tmap1 <- tm_shape(dat19) +
tm_polygons(fill = "Avg_On_2019_ca",
palette = "Blues", title = "Total Bus Boardings,
Philadelphia 2019") +
tm_layout(legend.position = c("right", "bottom"))
# Total Bus Boardings, Philadelphia 2023
tmap2 <- tm_shape(dat23) +
tm_polygons(fill = "Avg_On_2023_ca",
palette = "Blues", title = "Total Bus Boardings,
Philadelphia 2023") +
tm_layout(legend.position = c("right", "bottom"))
tmap_arrange(tmap1, tmap2)

Summary Statistic
As explained previously, we log transformed the dependent variable.
In our finalized markdown, that step is included in this code chunk to
be included in the summary stat. Here we have provided the list of
variables as well as their count. Mean and median have been provided to
indicate the average value and centrality in data while standard
deviation indicates dispersion or skewness of data.
# log transform
dat19$log.Avg_On_2019 <- log(dat19$Avg_On_2019)
dat23$log.Avg_On_2023 <- log(dat23$Avg_On_2023)
# tables
table <- dat19 %>%
st_drop_geometry() %>%
select(-GEOID, -NAME) %>%
format(scientific = FALSE) %>%
describe(skew = F) %>%
dplyr::select(n, mean, sd, min, max) %>%
kable(caption = "Summary Statistics for Philadelphia, 2019") %>%
kable_minimal()
table
Summary Statistics for Philadelphia, 2019
|
|
n
|
mean
|
sd
|
min
|
max
|
|
median_household_income*
|
355
|
164.059155
|
93.3756728
|
1
|
326
|
|
total_population*
|
355
|
169.842253
|
96.2516001
|
1
|
338
|
|
median_age*
|
355
|
87.014084
|
45.6847132
|
1
|
181
|
|
total_workers_16_over*
|
355
|
171.585915
|
98.0888294
|
1
|
342
|
|
public_transport*
|
355
|
63.104225
|
50.0361210
|
1
|
172
|
|
walked*
|
355
|
36.974648
|
36.8792032
|
1
|
129
|
|
Othermode*
|
355
|
6.076056
|
11.5698501
|
1
|
53
|
|
WFH*
|
355
|
1.042253
|
0.3918888
|
1
|
6
|
|
Car*
|
355
|
160.763380
|
94.2797874
|
1
|
327
|
|
veh_own*
|
355
|
178.000000
|
102.6239088
|
1
|
355
|
|
perc_female*
|
355
|
178.000000
|
102.6239088
|
1
|
355
|
|
perc_white*
|
355
|
178.000000
|
102.6239088
|
1
|
355
|
|
perc_black*
|
355
|
178.000000
|
102.6239088
|
1
|
355
|
|
perc_hispanic*
|
355
|
174.028169
|
102.5758603
|
1
|
351
|
|
perc_school_enrolled*
|
355
|
177.743662
|
102.2006401
|
1
|
342
|
|
area*
|
355
|
178.000000
|
102.6239088
|
1
|
355
|
|
pop_density*
|
355
|
178.000000
|
102.6239088
|
1
|
355
|
|
veh_own_ca*
|
355
|
2.498591
|
1.1208720
|
1
|
4
|
|
income_ca*
|
355
|
2.487324
|
1.1233185
|
1
|
4
|
|
Avg_On_2019*
|
355
|
172.433803
|
100.1557995
|
1
|
346
|
|
Bike.Trips*
|
355
|
6.859155
|
14.7725656
|
1
|
65
|
|
BusStops*
|
355
|
23.980282
|
14.9232369
|
1
|
67
|
|
OID_*
|
355
|
178.000000
|
102.6239088
|
1
|
355
|
|
Jobs*
|
355
|
147.402817
|
89.7892604
|
1
|
312
|
|
BikeStop*
|
355
|
1.180282
|
0.3849645
|
1
|
2
|
|
job_density*
|
355
|
178.000000
|
102.6239088
|
1
|
355
|
|
Avg_On_2019_ca*
|
355
|
2.709859
|
1.3727735
|
1
|
5
|
|
log.Avg_On_2019*
|
355
|
172.433803
|
100.1557995
|
1
|
346
|
table1 <- dat23 %>%
st_drop_geometry() %>%
select(-GEOID, -NAME) %>%
format(scientific = FALSE) %>%
describe(skew = F) %>%
dplyr::select(n, mean, sd, min, max) %>%
kable(caption = "Summary Statistics for Philadelphia, 2022-23") %>%
kable_minimal()
table1
Summary Statistics for Philadelphia, 2022-23
|
|
n
|
mean
|
sd
|
min
|
max
|
|
median_household_income*
|
356
|
165.896067
|
94.6207129
|
1
|
331
|
|
total_population*
|
356
|
173.789326
|
99.5428027
|
1
|
346
|
|
median_age*
|
356
|
88.924157
|
47.4306893
|
1
|
188
|
|
total_workers_16_over*
|
356
|
170.216292
|
96.4030323
|
1
|
338
|
|
public_transport*
|
356
|
58.851124
|
51.3521701
|
1
|
173
|
|
walked*
|
356
|
25.092697
|
30.9053048
|
1
|
109
|
|
Othermode*
|
356
|
6.952247
|
12.7866775
|
1
|
57
|
|
WFH*
|
356
|
1.002809
|
0.0529999
|
1
|
2
|
|
Car*
|
356
|
160.609551
|
92.0886702
|
1
|
324
|
|
veh_own*
|
356
|
178.500000
|
102.9125843
|
1
|
356
|
|
perc_female*
|
356
|
178.500000
|
102.9125843
|
1
|
356
|
|
perc_white*
|
356
|
177.502809
|
102.9077393
|
1
|
355
|
|
perc_black*
|
356
|
178.500000
|
102.9125843
|
1
|
356
|
|
perc_hispanic*
|
356
|
169.626405
|
102.7008415
|
1
|
347
|
|
perc_school_enrolled*
|
356
|
178.070225
|
102.2127200
|
1
|
339
|
|
area*
|
356
|
178.500000
|
102.9125843
|
1
|
356
|
|
pop_density*
|
356
|
178.500000
|
102.9125843
|
1
|
356
|
|
veh_own_ca*
|
356
|
2.500000
|
1.1196076
|
1
|
4
|
|
income_ca*
|
356
|
2.752809
|
1.0802284
|
1
|
4
|
|
Avg_On_2023*
|
356
|
172.087079
|
99.4341781
|
1
|
345
|
|
Bike.Trips*
|
356
|
14.449438
|
26.2798638
|
1
|
98
|
|
BusStops*
|
356
|
23.896067
|
14.9573247
|
1
|
67
|
|
OID_*
|
356
|
178.500000
|
102.9125843
|
1
|
356
|
|
Jobs*
|
356
|
154.404494
|
92.0692904
|
1
|
319
|
|
BikeStop*
|
356
|
1.278090
|
0.4486885
|
1
|
2
|
|
job_density*
|
356
|
178.500000
|
102.9125843
|
1
|
356
|
|
Avg_On_2023_ca*
|
356
|
2.769663
|
1.2346093
|
1
|
5
|
|
log.Avg_On_2023*
|
356
|
172.087079
|
99.4341781
|
1
|
345
|
Histogram 2019
options(scipen = 999)
a0 <- ggplot(dat19, aes(x = log.Avg_On_2019)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Log Average On-boarding",
title = "Distribution of Log Average On-boarding by Cenus Tract",
caption = "Figure 0")+ theme_minimal()+
theme(text = element_text(size = 8))
a1 <- ggplot(dat19, aes(x = median_household_income)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Median Household Income",
title = "Distribution of Median Household Income",
caption = "Figure 1")+ theme_minimal()+
theme(text = element_text(size = 8))
a2 <- ggplot(dat19, aes(x = perc_school_enrolled)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Percentage of population enrolled in school",
title = "Distribution of population enrolled in school",
caption = "Figure 2")+ theme_minimal()+
theme(text = element_text(size = 8))
a3 <- ggplot(dat19, aes(x = total_workers_16_over)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Total Number of Workers over 16 years",
title = "Distribution Total Number of Workers over 16 years",
caption = "Figure 3")+ theme_minimal()+
theme(text = element_text(size = 8))
a4 <- ggplot(dat19, aes(x = median_age)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Median Age Census Tract",
title = "Distribution of Median Age by Census Tract",
caption = "Figure 4")+ theme_minimal()+
theme(text = element_text(size = 8))
a5 <- ggplot(dat19, aes(x = pop_density*100)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Population density",
title = "Distribution of Population across Census Tracts",
caption = "Figure 5")+ theme_minimal()+
theme(text = element_text(size = 8))
a6 <- ggplot(dat19, aes(x = log(job_density))) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Job density",
title = "Distribution of Jobs across Census Tracts",
caption = "Figure 6")+ theme_minimal()+
theme(text = element_text(size = 8))
a7 <- ggplot(dat19, aes(x = veh_own)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Vehicle Ownership Rate",
title = "Vehicle Ownership Rate by Census Tract",
caption = "Figure 7")+ theme_minimal()+
theme(text = element_text(size = 8))
a8 <- ggplot(dat19, aes(x = perc_female)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Percentage of Females",
title = "Percentage of Females across Census Tract",
caption = "Figure 8")+ theme_minimal()+
theme(text = element_text(size = 8))
a9 <- ggplot(dat19, aes(x = perc_black)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Percentage of Black Residents",
title = "Percentage of Black Residents by Census Tract",
caption = "Figure 9")+ theme_minimal()+
theme(text = element_text(size = 8))
a10 <- ggplot(dat19, aes(x = public_transport)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Mode of transportation to work: Public Transit",
title = "Mode of transportation to work: Public Transit",
caption = "Figure 10")+ theme_minimal()+
theme(text = element_text(size = 8))
a11 <- ggplot(dat19, aes(x = Othermode)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Mode of transportation to work: Others",
title = "Mode of transportation to work: Others",
caption = "Figure 11")+ theme_minimal()+
theme(text = element_text(size = 8))
a12 <- ggplot(dat19, aes(x = BusStops)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Number of Bus Stops",
title = "Total Number of Bus Stops by Census Tract",
caption = "Figure 12")+ theme_minimal()+
theme(text = element_text(size = 8))
a13 <- ggplot(dat19, aes(x = log(Bike.Trips))) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Number of IndeGo Bike Trips",
title = "Total Number of IndeGo Bike Trips by Census Tract",
caption = "Figure 13")+ theme_minimal()+
theme(text = element_text(size = 8))
a14 <- ggplot(dat19, aes(x = BikeStop)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Presence of IndeGo BikeShare Station",
title = "Presence of IndeGo BikeShare Station",
caption = "Figure 14")+ theme_minimal()+
theme(text = element_text(size = 8))
grid.arrange(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, ncol = 3)

Histogram 2023
options(scipen = 999)
a0 <- ggplot(dat23, aes(x = log.Avg_On_2023)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Log Average On-boarding",
title = "Distribution of Log Average On-boarding by Cenus Tract",
caption = "Figure 0")+ theme_minimal()+
theme(text = element_text(size = 8))
a1 <- ggplot(dat23, aes(x = median_household_income)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Median Household Income",
title = "Distribution of Median Household Income",
caption = "Figure 1")+ theme_minimal()+
theme(text = element_text(size = 8))
a2 <- ggplot(dat23, aes(x = perc_school_enrolled)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Percentage of population enrolled in school",
title = "Distribution of population enrolled in school",
caption = "Figure 2")+ theme_minimal()+
theme(text = element_text(size = 8))
a3 <- ggplot(dat23, aes(x = total_workers_16_over)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Total Number of Workers over 16 years",
title = "Distribution Total Number of Workers over 16 years",
caption = "Figure 3")+ theme_minimal()+
theme(text = element_text(size = 8))
a4 <- ggplot(dat23, aes(x = median_age)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Median Age Census Tract",
title = "Distribution of Median Age by Census Tract",
caption = "Figure 4")+ theme_minimal()+
theme(text = element_text(size = 8))
a5 <- ggplot(dat23, aes(x = pop_density*100)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Population density",
title = "Distribution of Population across Census Tracts",
caption = "Figure 5")+ theme_minimal()+
theme(text = element_text(size = 8))
a6 <- ggplot(dat23, aes(x = log(job_density))) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Job density",
title = "Distribution of Jobs across Census Tracts",
caption = "Figure 6")+ theme_minimal()+
theme(text = element_text(size = 8))
a7 <- ggplot(dat23, aes(x = veh_own)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Vehicle Ownership Rate",
title = "Vehicle Ownership Rate by Census Tract",
caption = "Figure 7")+ theme_minimal()+
theme(text = element_text(size = 8))
a8 <- ggplot(dat23, aes(x = perc_female)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Percentage of Females",
title = "Percentage of Females across Census Tract",
caption = "Figure 8")+ theme_minimal()+
theme(text = element_text(size = 8))
a9 <- ggplot(dat23, aes(x = perc_black)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Percentage of Black Residents",
title = "Percentage of Black Residents by Census Tract",
caption = "Figure 9")+ theme_minimal()+
theme(text = element_text(size = 8))
a10 <- ggplot(dat23, aes(x = public_transport)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Mode of transportation to work: Public Transit",
title = "Mode of transportation to work: Public Transit",
caption = "Figure 10")+ theme_minimal()+
theme(text = element_text(size = 8))
a11 <- ggplot(dat23, aes(x = Othermode)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Mode of transportation to work: Others",
title = "Mode of transportation to work: Others",
caption = "Figure 11")+ theme_minimal()+
theme(text = element_text(size = 8))
a12 <- ggplot(dat23, aes(x = BusStops)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Number of Bus Stops",
title = "Total Number of Bus Stops by Census Tract",
caption = "Figure 12")+ theme_minimal()+
theme(text = element_text(size = 8))
a13 <- ggplot(dat23, aes(x = log(Bike.Trips))) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Number of IndeGo Bike Trips",
title = "Total Number of IndeGo Bike Trips by Census Tract",
caption = "Figure 13")+ theme_minimal()+
theme(text = element_text(size = 8))
a14 <- ggplot(dat23, aes(x = BikeStop)) +
geom_histogram(fill = "blue", color = "black")+
labs(x = "Presence of IndeGo BikeShare Station",
title = "Presence of IndeGo BikeShare Station",
caption = "Figure 14")+ theme_minimal()+
theme(text = element_text(size = 8))
grid.arrange(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, ncol = 3)

Findings
The histograms suggest that while some variables related to
socioeconomic status and urban infrastructure have remained stable,
there are noticeable changes in transportation modes and infrastructure
developments, particularly in bike-sharing, which may have impacted bus
ridership post-COVID. For the dependent variable, the total onboarding
decreased in 2023 for almost all census tracts compared to 2019.
Moreover, for transportation, there has been a notable shift in the mode
of transportation to work. Public transit usage appears to have been
slightly reduced in 2023, which could be a direct impact of the
pandemic. The vehicle ownership rates appear similar, suggesting stable
car ownership rates. Also, The total number of bus stops by census tract
shows some increase in certain tracts in 2023, which could be an attempt
to enhance public transport accessibility. There is a clear increase in
the number of Indego Bike trips, indicating a rise in the use of
bike-sharing systems, possibly due to changes in mobility preferences
post-COVID. The presence of Indego Bike-share stations shows a dramatic
increase, highlighting significant investment in this
infrastructure.
Correlation Scatterplot 2019
options(scipen = 999)
p1 <- ggplot(dat19, aes(x= median_household_income, y =log.Avg_On_2019))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Median Household Income",
caption = "Figure 1")+theme(text = element_text(size = 8))
p2 <- ggplot(dat19, aes(x= total_workers_16_over, y =log.Avg_On_2019))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Total Number of Workers over 16 years",
caption = "Figure 2")+theme(text = element_text(size = 8))
p3 <- ggplot(dat19, aes(x= median_age, y =log.Avg_On_2019))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Median Age",
caption = "Figure 3")+theme(text = element_text(size = 8))
p4 <- ggplot(dat19, aes(x= pop_density*100, y =log.Avg_On_2019))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Population density",
caption = "Figure 4")+theme(text = element_text(size = 8))
p5 <- ggplot(dat19, aes(x= log(job_density), y =log.Avg_On_2019))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Log Job density",
caption = "Figure 5")+theme(text = element_text(size = 8))
p6 <- ggplot(dat19, aes(x= veh_own, y =log.Avg_On_2019))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Vehicle Ownership Rate",
caption = "Figure 6")+theme(text = element_text(size = 8))
p7 <- ggplot(dat19, aes(x= perc_female, y =log.Avg_On_2019))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Percentage of Females",
caption = "Figure 7")+theme(text = element_text(size = 8))
p8 <- ggplot(dat19, aes(x= perc_black, y =log.Avg_On_2019))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Percentage of Black Residents",
caption = "Figure 8")+theme(text = element_text(size = 8))
p9 <- ggplot(dat19, aes(x= public_transport, y =log.Avg_On_2019))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Mode of transportation to work: Public Transit",
caption = "Figure 9")+theme(text = element_text(size = 8))
p10 <- ggplot(dat19, aes(x= Othermode, y =log.Avg_On_2019))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Mode of transportation to work: Others",
caption = "Figure 10")+theme(text = element_text(size = 8))
p11 <- ggplot(dat19, aes(x= Car, y =log.Avg_On_2019))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Mode of transportation to work: Car",
caption = "Figure 11")+theme(text = element_text(size = 8))
p12 <- ggplot(dat19, aes(x= BusStops, y =log.Avg_On_2019))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Number of Bus Stops",
caption = "Figure 12")+theme(text = element_text(size = 8))
grid.arrange(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, ncol = 3)

Correlation Scatterplot 2023
options(scipen = 999)
p1 <- ggplot(dat23, aes(x= median_household_income, y =log.Avg_On_2023))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Median Household Income",
caption = "Figure 1")+theme(text = element_text(size = 8))
p2 <- ggplot(dat23, aes(x= total_workers_16_over, y =log.Avg_On_2023))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Total Number of Workers over 16 years",
caption = "Figure 2")+theme(text = element_text(size = 8))
p3 <- ggplot(dat23, aes(x= median_age, y =log.Avg_On_2023))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Median Age",
caption = "Figure 3")+theme(text = element_text(size = 8))
p4 <- ggplot(dat23, aes(x= pop_density*100, y =log.Avg_On_2023))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Population density",
caption = "Figure 4")+theme(text = element_text(size = 8))
p5 <- ggplot(dat23, aes(x= log(job_density), y =log.Avg_On_2023))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Log Job density",
caption = "Figure 5")+theme(text = element_text(size = 8))
p6 <- ggplot(dat23, aes(x= veh_own, y =log.Avg_On_2023))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Vehicle Ownership Rate",
caption = "Figure 6")+theme(text = element_text(size = 8))
p7 <- ggplot(dat23, aes(x= Bike.Trips, y =log.Avg_On_2023))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Number of BikeShare Trips",
caption = "Figure 7")+theme(text = element_text(size = 8))
p8 <- ggplot(dat23, aes(x= perc_black, y =log.Avg_On_2023))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Percentage of Black Residents",
caption = "Figure 8")+theme(text = element_text(size = 8))
p9 <- ggplot(dat23, aes(x= public_transport, y =log.Avg_On_2023))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Mode of transportation to work: Public Transit",
caption = "Figure 9")+theme(text = element_text(size = 8))
p10 <- ggplot(dat23, aes(x= Othermode, y =log.Avg_On_2023))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Mode of transportation to work: Others",
caption = "Figure 10")+theme(text = element_text(size = 8))
p11 <- ggplot(dat23, aes(x= Car, y =log.Avg_On_2023))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Mode of transportation to work: Car",
caption = "Figure 11")+theme(text = element_text(size = 8))
p12 <- ggplot(dat23, aes(x= BusStops, y =log.Avg_On_2023))+
geom_point(color = "blue")+theme_minimal()+geom_smooth(method="lm", se = FALSE)+
labs(x = "Number of Bus Stops",
caption = "Figure 12")+theme(text = element_text(size = 8))
grid.arrange(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, ncol = 3)

Findings
Scatter plots between a logged dependent variable and an independent
variable were created to help visualize their relationship. The
relationship between log-on-boardings and several independent variables
such as median household income, total number of workers over 16 years,
median age, percentage of black residents, car, and number of bus stops
stayed more or less constant between the two years. The log-on-boardings
vs. Population density graph shows a steeper incline for the year 2023,
suggesting that bus ridership has gone down in some areas with lower
population density but it has increased in one particular census tract,
which is an outlier. The ridership increases as job density increases,
with most lying around 1808 jobs per census tract; however, the total
job density has also decreased. Vehicle Ownership rate also shows a
consistent trend wherein ridership nosedives as the number of vehicles
in a household increases. In 2019, ridership improves when the
percentage of females in a census tract increases. In census tracts
where more people take transit to work, the relation between total bus
ridership and public transit to work has also weakened in the year 2023,
looking at the two scatterplots, although it is still an upward trend.
Whereas for other transit modes (as reported in ACS), the downward trend
has stayed the same.
Correlation Matrix 2019
# 2019
numericVars <- dat19 %>%
st_drop_geometry() %>%
select_if(is.numeric) %>%
na.omit()
# Calculate correlation matrix
correlation_matrix <- cor(numericVars, method = "pearson")
# Create correlation plot
numericVars %>%
correlate() %>%
autoplot(type = "upper", text_size = 3) +
geom_text(aes(label = round(r,digits=2)),size = 3)

Correlation Matrix 2023
# 2023
numericVars1 <- dat23 %>%
st_drop_geometry() %>%
select_if(is.numeric) %>%
na.omit()
# Calculate correlation matrix
correlation_matrix <- cor(numericVars1, method = "pearson")
# Create correlation plot
numericVars1 %>%
correlate() %>%
autoplot(type = "upper", text_size = 3) +
geom_text(aes(label = round(r,digits=2)),size = 3)

Findings
A correlation matrix was created for the two years before the
regression was run to determine the strength and direction of linear
relationships between pairs of variables. This was done to avoid
potential multicollinearity issues in regression. High collinearity
(close to 1 or -1) was observed between the following pairs of
independent variables:
Total population, total workers over 16 years, and median
household income, 2019 and 2023
Usage of a car as mode of transport to work and vehicle
ownership, total population, total workers over 16 years, and median
household income, 2019 and 2023
Percentage white and percentage black residents, 2019 and 2023
Derived variables and parent variable
Average on-boarding and log average onboarding, 2019 and
2023
Area and population density, 2019 and 2023
Jobs and job density, 2019 and 2023
Bike stops and bike trips, 2019
In this case, none of the variables was highly correlated with the
dependent variable. Due to redundancy, variables with high
intercorrelations were eliminated from the final regression model.
IMPLICATIONS
According to our findings, there are four important implications for
SEPTA’s bus ridership recovery from COVID-19. The focus includes vehicle
ownership and urban transit policy, the impact of demographic changes on
transit use, the role of job density in supporting bus transit, and
integration with Other modes of transport.
First, since the increase in bus ridership in areas with lower
vehicle ownership highlights a critical dependency on public transport
where private vehicle access is limited, policies that support and
expand public transit accessibility in lower vehicle ownership areas
could address transportation equity. SEPTA would be the major influencer
to expand the level of the service for bus routes that pass
neighborhoods with lower vehicle ownership rates. At the same time,
upgrading the SEPTA bus stops and other infrastructure is also important
to increase the accessibility for buses. At the same time, SEPTA can
also encourage public transport use even in higher vehicle ownership
areas through incentives to balance the usage and reduce congestion and
environmental impact.
Second, transit authorities like OTIS or SEPTA should consider the
needs and perceptions of older adults in their service designs. Because
the shift in the relationship between median age and bus ridership pre-
and post-COVID suggests changes in transit use preferences among
different age groups, particularly older populations, the transportation
department would make the bus environment safe and sanitary to encourage
older people to use bus services. Since our model for median age lacks
significant association with bus ridership, further research can
Investigate the long-term impacts of the pandemic on transit use habits
among different ages.
Third, it is necessary for transportation planners to consider
integrating public transit planning with the development of new
employment hubs or the revitalization of existing ones. Based on the
significant and consistent positive effect of job density on bus
ridership, it is important to apply TOD to establish tight connections
between public transit and employment hubs. Moreover, government and
planners could also revitalize existing employment hubs by increasing
the accessibility to bus transit to ensure robust transit links to job
centers can enhance ridership and support economic activity.
Fourth, policymakers could promote integrated transport networks like
transfers between modes, enhancing overall urban mobility efficiency.
Our findings suggest the evolving relationship between bus ridership and
bike-sharing systems from competition to coexistence and the importance
of other commuting modes. Several Policies can involve infrastructure
investments like better bike racks on buses and timed connections
between different modes. Furthermore, future research should explore the
dynamics of multimodal transport use patterns, particularly how these
patterns impact bus ridership.